Semiconductors & Electronics


Samsung could pre-load Perplexity AI on its future Galaxy smartphones

Mashable

Samsung users might see Perplexity-powered features on future devices. According to a report from Bloomberg, Samsung is "nearing a wide-ranging deal" to bring Perplexity search capabilities to the upcoming Galaxy S26 series. That could include the Perplexity app pre-loaded on Samsung devices, Perplexity search features within Samsung's web browser, and possibly integration of Perplexity with Samsung's Bixby virtual assistant. The outlet also reports that Samsung would be a major investor in Perplexity's latest funding round, which seeks to raise $500 million at a $14 billion valuation. Samsung was early to bring AI features to its devices, claiming the Galaxy S24 series was the "first AI phone."


Perplexity AI coming soon to these Samsung devices - report

ZDNet

Samsung has been offering its mobile customers a robust selection of Galaxy AI features via integration with Google Gemini. A deal with Perplexity AI may soon expand the AI features on Samsung devices. On Sunday, Bloomberg published a report, citing people close to the matter, on a wide-ranging deal between Samsung and AI startup Perplexity that would preload Perplexity's app and assistant on future Samsung devices. Perplexity's AI search engine would also be plugged into the Samsung web browser, giving users easy access to AI-powered browsing.


Scalable and Effective Arithmetic Tree Generation for Adder and Multiplier Designs, David Z. Pan

Neural Information Processing Systems

Across a wide range of hardware scenarios, the computational efficiency and physical size of arithmetic units significantly influence the speed and footprint of the overall hardware system. Nevertheless, prior arithmetic design techniques fall short: they do not sufficiently optimize speed and area, resulting in increased latency and larger modules. To boost computing performance, this work focuses on the two most common and fundamental arithmetic modules, adders and multipliers. We cast the design tasks as single-player tree generation games, leveraging reinforcement learning techniques to optimize their arithmetic tree structures. This tree generation formulation allows us to efficiently navigate the vast search space and discover superior arithmetic designs that improve computational efficiency and hardware size within just a few hours.
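
To make the game formulation concrete, here is a minimal sketch of a single-player tree generation episode for an adder's carry-prefix tree: an action picks a split point, tree depth proxies delay, and operator count proxies area. The environment, reward weights, and random-restart policy are illustrative assumptions, not the paper's environment or its reinforcement learning agent (which learns the policy rather than sampling at random).

    import random

    def build_tree(lo, hi, pick_split):
        """Build a carry-prefix tree over bit range [lo, hi]; return (depth, ops)."""
        if lo == hi:
            return 0, 0                      # a single bit needs no prefix operator
        k = pick_split(lo, hi)               # the "action": where to split the range
        d_left, n_left = build_tree(lo, k, pick_split)
        d_right, n_right = build_tree(k + 1, hi, pick_split)
        return 1 + max(d_left, d_right), n_left + n_right + 1

    def reward(depth, ops, alpha=1.0, beta=0.1):
        # Lower delay (depth) and area (operator count) mean higher reward.
        # In this single-output toy the operator count is fixed at bits - 1,
        # so the search effectively minimizes depth alone.
        return -(alpha * depth + beta * ops)

    def random_search(bits=16, episodes=200, seed=0):
        rng = random.Random(seed)
        best = None
        for _ in range(episodes):
            d, n = build_tree(0, bits - 1, lambda lo, hi: rng.randrange(lo, hi))
            cand = (reward(d, n), d, n)
            best = cand if best is None or cand > best else best
        return best

    r, d, n = random_search()
    print(f"best reward {r:.2f}: depth={d}, operators={n}")

Balanced splits win here (depth near log2 of the bit width); the paper's full formulation covers complete adder and multiplier trees where area genuinely trades off against delay.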


Short-Dot: Computing Large Linear Transforms Distributedly Using Coded Short Dot Products

Neural Information Processing Systems

Faced with the saturation of Moore's law and the increasing size and dimension of data, system designers have increasingly resorted to parallel and distributed computing to reduce the computation time of machine-learning algorithms. However, distributed computing is often bottlenecked by a small fraction of slow processors called "stragglers" that reduce the speed of computation because the fusion node has to wait for all processors to complete their processing. To combat the effect of stragglers, recent literature proposes introducing redundancy in computations across processors, e.g., using repetition-based strategies or erasure codes. The fusion node can exploit this redundancy by completing the computation using outputs from only a subset of the processors, ignoring the stragglers. In this paper, we propose a novel technique, which we call "Short-Dot," to introduce redundant computations in a coding-theory-inspired fashion for computing linear transforms of long vectors. Instead of computing the long dot products required in the original linear transform, we construct a larger number of redundant, short dot products that can be computed more efficiently at individual processors. Further, only a subset of these short dot products is required at the fusion node to finish the computation successfully. We demonstrate through probabilistic analysis as well as experiments on computing clusters that Short-Dot offers significant speed-ups compared to existing techniques. We also derive trade-offs between the length of the dot products and the resilience to stragglers (the number of processors required to finish) for any such strategy, and compare them to the trade-off achieved by our strategy.
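
For intuition, here is a minimal sketch of the erasure-coded baseline the paper builds on: encode the rows of A so that the outputs of any m of the p workers suffice to recover Ax, letting the fusion node ignore stragglers. This is the generic dense strategy; Short-Dot's contribution, not implemented here, is additionally making each encoded row sparse so that every worker computes a shorter dot product.

    import numpy as np

    rng = np.random.default_rng(0)
    m, n, p = 4, 8, 6                  # m rows of A, n-dim x, p workers (p > m)

    A = rng.standard_normal((m, n))
    x = rng.standard_normal(n)

    # Vandermonde encoding matrix: any m of its p rows form an invertible
    # submatrix (distinct nodes), i.e., an MDS code over the reals.
    G = np.vander(np.arange(1.0, p + 1), m, increasing=True)
    F = G @ A                          # worker i stores the encoded row F[i]

    worker_outputs = F @ x             # each worker computes one dot product

    # Suppose workers 1 and 4 straggle; decode from any m = 4 survivors.
    survivors = [0, 2, 3, 5]
    Ax = np.linalg.solve(G[survivors], worker_outputs[survivors])

    assert np.allclose(Ax, A @ x)
    print("recovered A @ x while ignoring stragglers:", Ax)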


Label Free Language Model Routing

Neural Information Processing Systems

Large language models (LLMs) are increasingly used in applications where LLM inputs may span many different tasks. Recent work has found that the choice of LLM is consequential, and different LLMs may be good for different input samples. Prior approaches have thus explored how engineers might select an LLM to use for each sample (i.e., routing).
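
As a concrete, heavily simplified picture of per-sample routing, the sketch below sends each input to one of several candidate models via a nearest-centroid rule over cheap features. The featurizer, the hypothetical model pool, and the centroids are all illustrative assumptions; the paper's focus is making such a router work without labeled data, which this toy does not address.

    import numpy as np

    rng = np.random.default_rng(0)

    def embed(text, dim=32):
        """Stand-in featurizer: hash character trigrams into a fixed-size vector."""
        v = np.zeros(dim)
        for i in range(len(text) - 2):
            v[hash(text[i:i + 3]) % dim] += 1.0
        return v / (np.linalg.norm(v) + 1e-9)

    MODELS = ["llm-small", "llm-code", "llm-long-context"]   # hypothetical pool

    # Per-model centroids of inputs each model handles well; in practice these
    # would be estimated from routing data (labeled or, per the paper, not).
    centroids = {name: rng.standard_normal(32) for name in MODELS}

    def route(text):
        e = embed(text)
        return max(MODELS, key=lambda name: float(e @ centroids[name]))

    print(route("def quicksort(arr): ..."))   # routes to the closest centroid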


Samsung nears wide-ranging deal with Perplexity for AI features

The Japan Times

Samsung Electronics is nearing a wide-ranging deal to invest in Perplexity AI and put search technology from the artificial intelligence startup at the forefront of the South Korean company's devices. The two companies are in talks to preload Perplexity's app and assistant on upcoming Samsung devices and integrate the startup's search features into the Samsung web browser, according to people with knowledge of the matter. The firms have also discussed weaving Perplexity's technology into Samsung's Bixby virtual assistant, said the people, who asked not to be identified because the talks are private. Samsung is planning to announce the Perplexity integrations as early as this year, the people said, with the goal of including the service as a default assistant option in the Galaxy S26 phone line that's slated to launch in the first half of 2026. However, the specific details haven't been finalized and could still change.


On the Downstream Performance of Compressed Word Embeddings

Neural Information Processing Systems

Compressing word embeddings is important for deploying NLP models in memory-constrained settings. However, understanding what makes compressed embeddings perform well on downstream tasks is challenging: existing measures of compression quality often fail to distinguish between embeddings that perform well and those that do not. We thus propose the eigenspace overlap score as a new measure. We relate the eigenspace overlap score to downstream performance by developing generalization bounds for the compressed embeddings in terms of this score, in the context of linear and logistic regression. We then show that we can lower bound the eigenspace overlap score for a simple uniform quantization compression method, helping to explain the strong empirical performance of this method. Finally, we show that by using the eigenspace overlap score as a selection criterion between embeddings drawn from a representative set we compressed, we can efficiently identify the better performing embedding with up to 2x lower selection error rates than the next best measure of compression quality, and avoid the cost of training a model for each task of interest.
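
A short sketch of what such a score can look like: compare the left singular subspaces of the original embedding matrix and its compressed version. The exact definition and normalization in the paper may differ from the version below; treat it as illustrative, shown alongside the uniform quantization the abstract mentions.

    import numpy as np

    def eigenspace_overlap(X, X_tilde):
        """Overlap of the left singular subspaces of X and X_tilde, in [0, 1]."""
        U, _, _ = np.linalg.svd(X, full_matrices=False)
        U_t, _, _ = np.linalg.svd(X_tilde, full_matrices=False)
        d = max(U.shape[1], U_t.shape[1])
        return np.linalg.norm(U.T @ U_t, "fro") ** 2 / d

    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 50))       # stand-in embedding matrix
    X_q = np.round(X * 4) / 4                 # simple uniform quantization
    X_junk = rng.standard_normal(X.shape)     # unrelated matrix for contrast

    print(f"overlap(original, quantized): {eigenspace_overlap(X, X_q):.3f}")    # near 1
    print(f"overlap(original, unrelated): {eigenspace_overlap(X, X_junk):.3f}") # near 0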


Towards Next-Generation Logic Synthesis: A Scalable Neural Circuit Generation Framework, Qingyue Yang

Neural Information Processing Systems

Logic Synthesis (LS) aims to generate an optimized logic circuit satisfying a given functionality, and it generally consists of circuit translation and optimization. It is a challenging and fundamental combinatorial optimization problem in integrated circuit design. Traditional LS approaches rely on manually designed heuristics, while machine learning has recently offered a promising path toward next-generation logic synthesis via neural circuit generation and optimization. In this paper, we first revisit the application of differentiable neural architecture search (DNAS) methods to circuit generation and find, through extensive experiments, that existing DNAS methods struggle to generate circuits exactly, scale poorly to large circuits, and exhibit high sensitivity to hyper-parameters. We then provide three major insights into these challenges from extensive empirical analysis: 1) DNAS tends to overfit with too many skip-connections, consequently wasting a significant portion of the network's expressive capacity; 2) DNAS suffers from a structural bias between the network architecture and the circuit's inherent structure, leading to inefficient search; 3) the learning difficulty of different input-output examples varies significantly, leading to severely imbalanced learning. To address these challenges systematically, we propose a novel regularized triangle-shaped circuit network generation framework, which leverages our key insights for completely accurate and scalable circuit generation. Furthermore, we propose an evolutionary algorithm assisted by a reinforcement learning agent restarting technique for efficient and effective neural circuit optimization. Extensive experiments on four different circuit benchmarks demonstrate that our method can precisely generate circuits with up to 1200 nodes. Moreover, our synthesized circuits significantly outperform the state-of-the-art results from several competitive winners in the IWLS 2022 and 2023 competitions.
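
To ground the DNAS starting point the abstract critiques, here is a toy differentiable circuit-generation setup: each node holds a softmax over candidate gate types, booleans are relaxed to [0, 1] so the circuit is end-to-end differentiable, and gradient descent tries to recover XOR with a two-level gate network. The gate set, network shape, and training loop are illustrative assumptions; whether the relaxed weights snap to a clean discrete circuit is exactly the kind of brittleness the authors report.

    import torch

    torch.manual_seed(0)

    GATES = {
        "AND":  lambda x, y: x * y,
        "OR":   lambda x, y: x + y - x * y,
        "NAND": lambda x, y: 1 - x * y,
        "NOR":  lambda x, y: 1 - (x + y - x * y),
    }
    OPS, NAMES = list(GATES.values()), list(GATES)

    class SoftGate(torch.nn.Module):
        """One circuit node: a softmax-weighted mixture over candidate gates."""
        def __init__(self):
            super().__init__()
            self.alpha = torch.nn.Parameter(torch.zeros(len(OPS)))

        def forward(self, x, y):
            w = torch.softmax(self.alpha, dim=0)
            return sum(wi * op(x, y) for wi, op in zip(w, OPS))

    a = torch.tensor([0.0, 0.0, 1.0, 1.0])
    b = torch.tensor([0.0, 1.0, 0.0, 1.0])
    target = torch.tensor([0.0, 1.0, 1.0, 0.0])          # XOR truth table

    g1, g2, out = SoftGate(), SoftGate(), SoftGate()
    params = [p for g in (g1, g2, out) for p in g.parameters()]
    opt = torch.optim.Adam(params, lr=0.1)
    for _ in range(500):
        opt.zero_grad()
        loss = ((out(g1(a, b), g2(a, b)) - target) ** 2).mean()
        loss.backward()
        opt.step()

    # Discretize: read off the argmax gate at each node.
    print([NAMES[int(g.alpha.argmax())] for g in (g1, g2, out)],
          f"relaxed loss={loss.item():.4f}")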


SHDocs: A dataset, benchmark, and method to efficiently generate high-quality, real-world specular highlight data with near-perfect alignment

Neural Information Processing Systems

A frequent problem in vision-based reasoning tasks such as object detection and optical character recognition (OCR) is the persistence of specular highlights. Specular highlights appear as bright spots of glare caused by the concentrated reflection of light; these spots manifest as image artifacts that obscure content from computer vision models and are challenging to reconstruct. Despite this, specular highlight removal receives relatively little attention due to the difficulty of acquiring high-quality, real-world data. We introduce a method to generate specular highlight data with near-perfect alignment and present SHDocs, a dataset of specular highlights on document images created using our method. Through our benchmark, we demonstrate that our dataset enables us to surpass the performance of state-of-the-art specular highlight removal models and improve downstream OCR. We release our dataset, code, and methods publicly to motivate further exploration of image enhancement for practical computer vision challenges.
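
To make the artifact itself concrete, below is a naive specular-highlight mask based on a common heuristic: flag pixels that are very bright yet nearly colorless. This is only an illustrative baseline detector on synthetic data; it is not the paper's capture and alignment method, nor one of the removal models it benchmarks.

    import numpy as np

    def naive_highlight_mask(rgb, v_thresh=0.92, s_thresh=0.15):
        """rgb: float array in [0, 1], shape (H, W, 3). Returns a boolean mask."""
        v = rgb.max(axis=-1)                                  # HSV value
        s = np.where(v > 0, (v - rgb.min(axis=-1)) / np.maximum(v, 1e-9), 0.0)
        return (v > v_thresh) & (s < s_thresh)                # bright and colorless

    img = np.random.default_rng(0).random((64, 64, 3))
    img[20:30, 20:30] = 0.98                                  # synthetic glare patch
    print("flagged pixels:", int(naive_highlight_mask(img).sum()))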


Stepping Forward on the Last Mile, Chen Feng, Qualcomm AI Research

Neural Information Processing Systems

Continuously adapting pre-trained models to local data on resource-constrained edge devices is the last mile of model deployment. However, as models increase in size and depth, backpropagation requires a large amount of memory, which becomes prohibitive for edge devices. In addition, most existing low-power neural processing engines (e.g., NPUs, DSPs, MCUs, etc.) are designed as fixed-point inference accelerators, without training capabilities. Forward gradients, based solely on directional derivatives computed from two forward calls, have recently been used for model training, with substantial savings in computation and memory. However, the performance of quantized training with fixed-point forward gradients remains unclear. In this paper, we investigate the feasibility of on-device training using fixed-point forward gradients by conducting comprehensive experiments across a variety of deep learning benchmark tasks in both the vision and audio domains. We propose a series of algorithm enhancements that further reduce the memory footprint and narrow the accuracy gap compared to backpropagation. We further present an empirical study of how training with forward gradients navigates the loss landscape. Our results demonstrate that, on the last mile of model customization on edge devices, training with fixed-point forward gradients is a feasible and practical approach.
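
For reference, here is a minimal floating-point sketch of the forward-gradient estimator the abstract builds on: two forward calls give a directional derivative along a random probe v, and the update steps along v scaled by that estimate, with no backward pass. The toy quadratic objective, finite-difference probe, and hyper-parameters are illustrative assumptions; the paper's subject is fixed-point variants of this idea on real models, which this sketch does not cover.

    import numpy as np

    rng = np.random.default_rng(0)

    def loss(theta):                       # toy objective: distance to a target
        return float(np.sum((theta - 3.0) ** 2))

    theta = rng.standard_normal(10)
    eps, lr = 1e-4, 0.05
    for _ in range(2000):
        v = rng.standard_normal(theta.shape)              # random probe direction
        d = (loss(theta + eps * v) - loss(theta)) / eps   # directional derivative
        theta -= lr * d * v                               # forward-gradient step
    print(f"final loss: {loss(theta):.6f}")               # approaches zero

Since E[(grad . v) v] equals the true gradient for standard normal v, the update is an unbiased (if high-variance) gradient step, which is what makes training without backpropagation feasible in principle.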